Abstract

Regular vine distributions, which constitute a flexible class of multivariate
dependence models, are discussed. Since multivariate copulae constructed through
pair-copula decompositions were introduced to the statistical community,
interest in these models has been growing steadily and they are finding
successful applications in various fields. Research so far has, however,
concentrated on so-called canonical and D-vine copulae, which are more
restrictive cases of regular vine copulae. It is shown how to evaluate the
density of arbitrary regular vine specifications. This opens the vine copula
methodology to the flexible modeling of complex dependencies even in larger
dimensions. In this regard, a new automated model selection and estimation
technique based on graph theoretical considerations is presented. This
comprehensive search strategy is evaluated in a large simulation study and
applied to a 16-dimensional financial data set of international equity, fixed
income and commodity indices which were observed over the last decade, in
particular during the recent financial crisis. The analysis provides
economically well interpretable results and interesting insights into the
dependence structure among these indices.

Keywords: minimum spanning tree, model selection, multivariate copula, 
regular vines 



1. Introduction 

The most popular statistical dependence model is the multivariate Gaussian
distribution. However, there is a growing demand for non-Gaussian models,
especially in finance (Cherubini et al. (2004)) but also in climate research
(e.g., Schölzel and Friederichs (2008)), environmental sciences (Salvadori et al.
(2007) and Kazianka and Pilz (2011)), medicine (e.g., Beaudoin and Lakhal-Chaieb
(2008)) and physics (e.g., Sato et al. (2010)), to name a few areas. With the
availability of large samples of multivariate data it is possible to investigate
non-Gaussian dependency models and to estimate parameters efficiently. The
backbone for such models is the famous theorem by Sklar (1959), which allows
one to construct general multivariate distributions from copulae and marginal
distributions. The specification of the copula can be done independently from
the margins. While there is a multitude of bivariate copulae (see the books
of Joe (1997) and Nelsen (2006)), the class of multivariate copulae was quite
restricted until recently. Two copula classes in particular received attention:
the class of elliptical copulae (Fang et al. (2002), Frahm et al. (2003)) and the
class of Archimedean copulae (Nelsen, 2005). Typical elliptical copulae are the
symmetric Gaussian and Student-t copulae (see for example Demarta and McNeil
(2005)), while the class of Archimedean copulae includes the tail-asymmetric
Clayton and Gumbel copulae.

For financial applications a flexible modeling of tails is vital to assess the
most common risk measure, Value-at-Risk (VaR) (for a definition see McNeil et al.
(2005)). In particular, the Gaussian copula does not allow for heavy tails and the
approach suggested by Li (2000) was blamed by many for contributing to the
recent financial crisis (see Salmon (2009)). This shows that there is a growing
need for more flexible copulae. While the Student-t copula allows for symmetric
tail dependence as measured by the tail dependence coefficient or tail
dependence function (see for example Joe et al. (2010)), it has only a single
parameter to control tail dependence of all pairs of variables. Standard
Archimedean multivariate copulae may be tail-asymmetric, but are governed only
by a single parameter. There has been effort to extend the class of Archimedean
copulae (see Joe (1997), Savu and Trede (2010), and Hofert (2011)); however, these
models require additional parameter restrictions.

These problems were noted by Aas et al. (2009), who started to utilize a
wider class of multivariate copulae. This class is constructed using only
bivariate copula specifications as dependency models for the distribution of
certain pairs of variables conditional on a specified set of variables. These
independent building blocks are called pair-copulae and were used to construct
multivariate distributions. This approach dates back to Joe (1996) and was
investigated and organized systematically by Bedford and Cooke (2001, 2002). The
identification of the needed pairs of variables and their corresponding set of
conditioning variables is facilitated by a sequence of trees (see for example
Chapter 4 of Kurowicka and Cooke (2006)). They called these trees regular vines
(R-vines) and the corresponding multivariate distribution an R-vine
distribution. For an $n$-dimensional R-vine distribution, the first tree
identifies $n-1$ pairs of variables, whose distribution is modeled directly.
The second tree identifies $n-2$ pairs of variables, whose distribution
conditional on a single variable is modeled by a pair-copula. The conditioning
variable is also determined in the second tree. The next tree again identifies
pairs of variables, whose conditional distribution is specified by a
pair-copula. Here the conditioning set has dimension 2 and is also determined.
Proceeding in this way the last tree determines a single pair of variables,
whose distribution conditional on all remaining variables is defined by a last
pair-copula. Recent developments and applications are discussed in Kurowicka
and Joe (2011). Czado (2010) provides a current survey about these statistical
model classes and Joe et al. (2010) investigate and discuss tail dependence
properties of vine distributions.

Aas et al. (2009) popularized two subclasses of regular vines, canonical vines
(C-vines) and drawable vines (D-vines). C-vines possess star structures in their
tree sequence, while D-vines have path structures. Kurowicka and Cooke (2006)
focused on vine distributions with Gaussian pair-copulae, but Aas et al. (2009)
allowed for different pair-copula families, such as the bivariate Student-t
copula, the bivariate Gumbel and the bivariate Clayton copula. While D-vine based
models have started to be used in many applications (Fischer et al. (2009), Min
and Czado (2010), Chollete et al. (2009), Hofmann and Czado (2010), Mendes et al.
(2010), Salinas-Gutiérrez et al. (2010), Erdorf et al. (2011), Mercier and Frison
(2009), Smith et al. (2010)), C-vines are less commonly used (Heinen and
Valdesogo (2009), Czado et al. (2010)); Nikoloulopoulos et al. (2012) consider
both classes.

Estimation in C- and D-vine copula models is often facilitated using maximum
likelihood. Since this will require optimization with respect to at least
$n(n-1)/2$ parameters, it is important to provide good starting values for the
optimization. For this purpose a fast sequential estimation procedure was
suggested and implemented in Aas et al. (2009), whose asymptotic properties are
investigated in Hobæk Haff (2011). Since bootstrapping or inversion of
high-dimensional Hessian matrices is required to obtain interval estimates,
Bayesian approaches have been followed for parameter estimation (Min and Czado,
2010) and pair-copula selection in specified D-vine copula models (Min and Czado
(2011) and Smith et al. (2010)).

However, the class of R-vine distributions is much larger than the class of
D- and C-vine distributions and currently there are very few applications of
R-vines. One reason for this is the enormous number of possible R-vine tree
sequences (see Morales-Nápoles et al. (2010)) to choose from. The importance of
a good selection choice has also been noted by Garcia and Tsafack (2009). This
provides the starting point of this paper. We develop an automated strategy
of jointly searching for an appropriate R-vine tree structure, the pair-copula
families and the parameter values of the chosen pair-copula families. It is a
sequential approach starting by identifying the first tree, its pair-copula
families and estimating their parameters. Based on this the specification of the
second tree utilizes transformed variables. The applied transformations depend
on the choices made in the first tree. In this manner all trees together with
their pair-copula families and corresponding parameters are selected. For each
tree selection we use a maximum spanning tree algorithm, where edge weights
are chosen appropriately to reflect large dependencies. Pair-copulae are chosen
independently. Here we use the Akaike information criterion (Akaike (1973)),
which performs well in this context (see Brechmann (2010, Chapter 5)). Finally
the corresponding pair-copula parameter estimation follows the same sequential
estimation approach as suggested for D- and C-vine copula distributions in
Aas et al. (2009).



With this automated search strategy we identify useful multivariate copula
models for multivariate data on the $n$-dimensional cube $[0,1]^n$, as we show
in a large simulation study, and meaningful models arise for the application
considered later.

Once an appropriate R-vine distribution is found for a data set we perform
maximum likelihood estimation for the parameters using the sequential
estimates as starting values. We would also like to perform this task in an
automated setup. This requires an efficient storage of the R-vine tree
specification, its pair-copula families and the corresponding parameters. This
is facilitated by a set of lower triangular matrices and we prove how the
corresponding joint density making up the likelihood can be evaluated
recursively. This setup is also used to provide an algorithm for simulating from
an R-vine distribution. Pseudo code for the corresponding algorithms is given.

Finally, we would like to note that the developed search strategies are able to
work not only in an automated fashion but also for higher-dimensional problems.
Previously, full maximum likelihood estimation had been implemented only for
problems in at most 10 dimensions. In our 16-dimensional application to financial
data we show the usefulness of our approach and demonstrate that R-vine
distributions provide a better fit than C- and D-vines for this data set. These
results have already spawned new research on finding more parsimonious
specifications, which replace higher-order pair-copulae by independence copulae.
See Brechmann et al. (2012) for details. This allows us to extend the
implementation to higher dimensions, which are especially needed for the risk
assessment of larger financial portfolios.

To summarize our contributions: We develop novel algorithms for evaluating
an R-vine density and simulating from specified R-vines. That is, we effectively
provide statistical inference techniques for R-vines. We further propose an
innovative R-vine selection and estimation method and thus, for the first time,
make it possible to actually select and fit arbitrary non-Gaussian R-vines to
data. This is exploited to analyze the returns of important financial indices.

The paper is organized as follows: Section 2 introduces R-vine distributions
and copulae. Necessary background from graph theory can be found in Diestel
(200i). Then the efficient storage of the R-vine specification and its statistical 
inference are developed. Selection of the R-vine tree structure, the pair-copula
families and their parameters is tackled in Section 3. This includes a simulation
study presented in Appendix A, which shows that the models proposed by the
search strategy are reasonable. The search and estimation algorithm is then
successfully applied to a 16-dimensional financial data set involving daily equity,
fixed income and commodity indices. In addition to sequential estimates, full ML
estimates are also provided. The paper closes with a summary and discussion.



2. Parametric regular-vine distributions 



2.1. Regular vines 

We begin this section with the theoretical background of a regular vine
(R-vine). We then give its representation as a matrix and show how the R-vine
copula density can be written in a convenient way using this matrix form. The
following summarizes some definitions and results from Bedford and Cooke (2001),
Bedford and Cooke (2002, Part 4) and Kurowicka and Cooke (2006, Chapter 4.4),
where a tree is a graph in which each two nodes are connected by a unique
sequence of edges.



Definition 2.1 (R-vine). $\mathcal{V} = (T_1, \ldots, T_{n-1})$ is an R-vine on $n$ elements if

(i) $T_1$ is a tree with nodes $N_1 = \{1, \ldots, n\}$ and a set of edges denoted $E_1$.

(ii) For $i = 2, \ldots, n-1$, $T_i$ is a tree with nodes $N_i = E_{i-1}$ and edge set $E_i$.

(iii) For $i = 2, \ldots, n-1$ and $\{a, b\} \in E_i$ with $a = \{a_1, a_2\}$ and $b = \{b_1, b_2\}$
it must hold that $\#(a \cap b) = 1$ (proximity condition).

In other words, an R-vine on $n$ elements is a nested set of $n-1$ trees such
that the edges of tree $j$ become the nodes of tree $j+1$. The proximity condition
ensures that two nodes in tree $j+1$ are only connected by an edge if these nodes
share a common node in tree $j$. We notice that the set of nodes in the first tree
contains all indices $1, \ldots, n$, while the set of edges is a set of $n-1$ pairs
of these indices. In the second tree the set of nodes contains sets of pairs of
indices and the set of edges is built of pairs of pairs of indices, etc.
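
The nesting of edges and the proximity condition can be checked mechanically. The following Python sketch is only an illustration: representing edges as nested `frozenset`s and the function name `satisfies_proximity` are assumptions made here, not part of the original text, and the sketch does not verify that each level is actually a tree (connected and acyclic).

```python
def satisfies_proximity(trees):
    """Check conditions (ii) and (iii) of Definition 2.1 for a tree sequence.

    trees[0] holds the edges of T_1, trees[1] those of T_2, and so on.
    Every edge is a frozenset with two elements; from T_2 onwards these
    elements are themselves edges of the previous tree.
    """
    for previous, current in zip(trees, trees[1:]):
        previous_edges = set(previous)
        for edge in current:
            a, b = tuple(edge)
            # (ii): the nodes of T_i must be edges of T_{i-1}
            if a not in previous_edges or b not in previous_edges:
                return False
            # (iii): the two joined edges must share exactly one node
            if len(a & b) != 1:
                return False
    return True


fs = frozenset
t1 = [fs({1, 2}), fs({2, 3}), fs({3, 4})]
good_t2 = [fs({fs({1, 2}), fs({2, 3})}), fs({fs({2, 3}), fs({3, 4})})]
bad_t2 = [fs({fs({1, 2}), fs({3, 4})}), fs({fs({2, 3}), fs({3, 4})})]
print(satisfies_proximity([t1, good_t2]))  # True
print(satisfies_proximity([t1, bad_t2]))   # False
```

In the second call the edges {1, 2} and {3, 4} share no node in the first tree, so joining them in the second tree violates the proximity condition.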

To further study properties of R-vines we define three sets associated with
their edges. The complete union of an edge is the set of all indices that this
edge contains. If two nodes $a$ and $b$ are joined by an edge, then the conditioned
and conditioning sets of this edge are the symmetric difference and the
intersection of the complete unions of $a$ and $b$, respectively.

Definition 2.2 (Complete union, conditioning and conditioned sets of an edge).

The complete union of an edge $e_i \in E_i$ is the set
$U_{e_i} = \{n_1 \in N_1 \mid \exists\, e_j \in E_j,\ j = 1, \ldots, i-1, \text{ with } n_1 \in e_1 \in e_2 \in \ldots \in e_{i-1} \in e_i\} \subset N_1$.
For $e_i = \{a, b\} \in E_i$, $a, b \in E_{i-1}$, $i = 1, \ldots, n-1$, the conditioning set of
an edge $e_i$ is $D_{e_i} = U_a \cap U_b$, and the conditioned sets of an edge $e_i$ are
$C_{e_i,a} = U_a \setminus D_{e_i}$, $C_{e_i,b} = U_b \setminus D_{e_i}$ and
$C_{e_i} = C_{e_i,a} \cup C_{e_i,b} = U_a \,\triangle\, U_b$, where
$A \,\triangle\, B := (A \setminus B) \cup (B \setminus A)$ denotes the symmetric difference of two sets.

The complete union of the edge $a$ between $(1,2)$ and $(2,3)$ in tree $T_2$ shown
in Figure 1 is $\{1,2,3\}$, since for instance $1 \in \{1,2\} \in \{\{1,2\},\{2,3\}\} = a$
and $3 \in \{2,3\} \in \{\{1,2\},\{2,3\}\} = a$, and the complete union of the edge $b$
between $(2,3)$ and $(3,6)$ is $\{2,3,6\}$. The conditioning and the conditioned sets
of the edge joining $a$ and $b$ are $\{2,3\}$ and $\{1,6\}$, respectively.
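
The example above can be reproduced with a few lines of Python. This is a minimal sketch under the assumption that edges are stored as nested `frozenset`s; the function names `complete_union` and `conditioned_and_conditioning` are chosen here for illustration and do not come from the original text.

```python
def complete_union(node):
    """Set of variable indices reachable from a (possibly nested) edge."""
    if isinstance(node, frozenset):
        return frozenset().union(*(complete_union(x) for x in node))
    return frozenset({node})  # a node of T_1 is a plain index


def conditioned_and_conditioning(edge):
    """Return (conditioned set, conditioning set) of edge = {a, b}."""
    a, b = tuple(edge)
    u_a, u_b = complete_union(a), complete_union(b)
    # symmetric difference and intersection of the complete unions
    return u_a ^ u_b, u_a & u_b


fs = frozenset
a = fs({fs({1, 2}), fs({2, 3})})  # edge a of T_2
b = fs({fs({2, 3}), fs({3, 6})})  # edge b of T_2
print(sorted(complete_union(a)))  # [1, 2, 3]
conditioned, conditioning = conditioned_and_conditioning(fs({a, b}))
print(sorted(conditioned), sorted(conditioning))  # [1, 6] [2, 3]
```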

The conditioned and conditioning sets of all edges of V are collected in a set 
called constraint set. Each element of this set is composed of a pair of indices 
corresponding to the conditioned set and a set containing indices corresponding 
to the conditioning set. 

Definition 2.3 (Constraint set). The constraint set for $\mathcal{V}$ is the set

$$\mathcal{CV} = \{(\{C_{e,a}, C_{e,b}\}, D_e) \mid e \in E_i,\ e = \{a, b\},\ i = 1, \ldots, n-1\}.$$

[Figure 1 (image not reproduced): the trees $T_1, \ldots, T_6$ of the example R-vine, with each edge labeled by its conditioned and conditioning sets.]

Figure 1: An example R-vine on seven variables. At each edge $e = \{a,b\} \in E_i$, the terms
$C_{e,a}$ and $C_{e,b}$ are separated by a comma and given to the left of the '|' sign, while $D_e$ appears
on the right.



It is convenient to enumerate nodes of the trees in an R-vine using their
conditioned and conditioning sets. In Figure 1 each edge of the R-vine has been
labeled with its conditioned sets printed before '|' and the conditioning set
shown after '|'. Moreover, we notice that the constraint set $\mathcal{CV}$ of an R-vine
contains all information needed to distinguish it from other R-vines.

Two special types of R-vines, namely the canonical (C-) and the D-vine, have
been used extensively in the literature. A D-vine is an R-vine for which the first
tree has nodes with degree two or less (path structure). A C-vine is an R-vine
which contains a node with maximal degree in each tree (star structure). It is
convenient to work with these two R-vine types as the first tree (D-vine) and
the ordering of the root nodes (C-vine) determine their structure completely.

R-vines have many interesting properties that can be found in Bedford and Cooke
(2002) and Kurowicka and Joe (2011).



2.2. Regular vine copulae 

The graphical structure of R-vines is used to specify the necessary copulae for
a so-called pair-copula construction, where a copula is a multivariate
distribution on the unit hypercube $[0,1]^n$ with uniform marginal distributions
(see Joe (1993) and Nelsen (2006)). To build an R-vine copula one must specify
$n-1$ unconditional bivariate copulae between variables indexed by the conditioned
sets of the edges in the first tree of the R-vine. For the second tree of the R-vine
one needs to specify the bivariate copulae between variables indexed by the
conditioned sets conditional on variables indexed by the conditioning sets of
edges of the R-vine. We formally define the R-vine copula specification
corresponding to an R-vine as in Bedford and Cooke (2002).



Definition 2.4 (R-vine copula specification). $(F, \mathcal{V}, B)$ is an R-vine
copula specification if $F = (F_1, \ldots, F_n)$ is a vector of continuous invertible
distribution functions, $\mathcal{V}$ is an $n$-dimensional R-vine and
$B = \{B_e \mid i = 1, \ldots, n-1;\ e \in E_i\}$ is a set of copulae with $B_e$ being a
bivariate copula, a so-called pair-copula.

A joint distribution $F$ of a random vector $(X_1, \ldots, X_n)$ is said to realize
an R-vine copula specification $(F, \mathcal{V}, B)$ or exhibit R-vine dependence if, for
each $e \in E_i$, $i = 1, \ldots, n-1$, $e = \{a, b\}$, $B_e$ is the bivariate copula of
$X_{C_{e,a}}$ and $X_{C_{e,b}}$ given $X_{D_e} = \{X_i \mid i \in D_e\}$, where it is assumed that
this conditional copula is independent of the conditioning variables $X_{D_e}$
(see Aas et al. (2009) and Hobæk Haff et al. (2010)). We call such a distribution
also an R-vine distribution. Additionally, the marginal distribution of $X_j$ has
to be $F_j$ for $j = 1, \ldots, n$. We denote the copula density of the copula $B_e$ for
the edge $e = \{a,b\}$ as $c_{C_{e,a}, C_{e,b} | D_e}$.

For the R-vine from Figure 1 we need to assign six unconditional copulae
$c_{1,2}$, $c_{2,3}$, $c_{3,4}$, $c_{2,5}$, $c_{3,6}$ and $c_{6,7}$ in the first tree, five conditional
copulae $c_{1,3|2}$, $c_{2,6|3}$, $c_{3,7|6}$, $c_{3,5|2}$ and $c_{2,4|3}$ in the second tree, etc.
All copulae can be of a different type and their parameters can be specified
independently from each other. However, since the copulae specified in a tree
affect the transformed variables used in later trees, the choices of the different
copulae influence each other.

The density of an R-vine copula specified through assigning appropriate
bivariate copulae to edges of the R-vine has been shown in Bedford and Cooke
(2001, 2002) to be equal to the product of conditional and unconditional copulae
assigned to its edges.

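Theorem 2.5. Let $(F, \mathcal{V}, B)$ be an R-vine copula specification on $n$ elements.
There is a unique distribution $F$ that realizes this R-vine copula specification
with density

$$f_{1 \ldots n}(x) = \prod_{k=1}^{n} f_k(x_k) \prod_{i=1}^{n-1} \prod_{e \in E_i} c_{C_{e,a}, C_{e,b} | D_e}\big(F_{C_{e,a}|D_e}(x_{C_{e,a}} | x_{D_e}),\, F_{C_{e,b}|D_e}(x_{C_{e,b}} | x_{D_e})\big), \qquad (1)$$

where $x = (x_1, \ldots, x_n)$, $e = \{a, b\}$ and $x_{D_e}$ stands for the variables in $D_e$, i.e.,
$x_{D_e} = \{x_i \mid i \in D_e\}$. Moreover, $f_i$ denotes the density of $F_i$ for $i = 1, \ldots, n$.

Notice that the copulae in (1) are indexed by elements of the set $\mathcal{CV}$ (see
Definition 2.3). To obtain the conditional distributions $F_{C_{e,a}|D_e}(x_{C_{e,a}} | x_{D_e})$ and
$F_{C_{e,b}|D_e}(x_{C_{e,b}} | x_{D_e})$ let $E_i \ni e = \{a,b\}$, $a = \{a_1, a_2\}$, $b = \{b_1, b_2\}$ be the edge
which connects $C_{e,a}$ with $C_{e,b}$ given the variables $D_e$. Joe (1996) showed that

$$F_{C_{e,a}|D_e}(x_{C_{e,a}} | x_{D_e}) = \frac{\partial\, C_{C_a | D_a}\big(F_{C_{a,a_1}|D_a}(x_{C_{a,a_1}} | x_{D_a}),\, F_{C_{a,a_2}|D_a}(x_{C_{a,a_2}} | x_{D_a})\big)}{\partial\, F_{C_{a,a_2}|D_a}(x_{C_{a,a_2}} | x_{D_a})} =: h\big(F_{C_{a,a_1}|D_a}(x_{C_{a,a_1}} | x_{D_a}),\, F_{C_{a,a_2}|D_a}(x_{C_{a,a_2}} | x_{D_a})\big), \qquad (2)$$

where $F_{C_{a,a_1}|D_a}(x_{C_{a,a_1}} | x_{D_a})$ and $F_{C_{a,a_2}|D_a}(x_{C_{a,a_2}} | x_{D_a})$ have to be obtained
recursively as shown in the next section. The notation of the $h$-function is
introduced for convenience.

Similarly, we obtain $F_{C_{e,b}|D_e}(x_{C_{e,b}} | x_{D_e})$. We call $F_{C_{e,a}|D_e}(x_{C_{e,a}} | x_{D_e})$ and
$F_{C_{e,b}|D_e}(x_{C_{e,b}} | x_{D_e})$ transformed variables.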
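
To make the $h$-function concrete, the following Python sketch evaluates it for one specific family. The choice of the bivariate Clayton copula, the parameter name `theta` and the function names are assumptions made here for illustration; the text itself does not fix a family at this point. The closed form is obtained by differentiating $C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with respect to the second argument.

```python
def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """h(u, v) = dC(u, v)/dv, the conditional cdf of U given V = v."""
    inner = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * inner ** (-1.0 / theta - 1.0)


# Compare the closed form with a central finite difference of C in v.
u, v, theta, eps = 0.3, 0.7, 2.0, 1e-6
numeric = (clayton_cdf(u, v + eps, theta) - clayton_cdf(u, v - eps, theta)) / (2 * eps)
print(abs(numeric - clayton_h(u, v, theta)) < 1e-6)  # True
```

Applying such an $h$-function repeatedly, tree by tree, yields the transformed variables that enter the pair-copulae of the next tree.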

For C- and D-vines the density (1) can be rewritten in a more convenient
way. For more information on how to exploit the structure of C- and D-vines
see Berg and Aas (2009), Min and Czado (2010, 2011) and Czado et al. (2010).
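
As an illustration of such a rewriting (written in the notation of this section, with the variable ordering $1, \ldots, n$ along the path assumed here), the D-vine density takes the familiar form used in Aas et al. (2009):

$$f_{1 \ldots n}(x) = \prod_{k=1}^{n} f_k(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{i,\,i+j \,|\, i+1, \ldots, i+j-1}\big(F(x_i | x_{i+1}, \ldots, x_{i+j-1}),\, F(x_{i+j} | x_{i+1}, \ldots, x_{i+j-1})\big).$$

For $n = 3$, the first tree has the edges $\{1,2\}$ and $\{2,3\}$, the second tree joins them, and the density reads

$$f_{123}(x) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\; c_{1,2}\big(F_1(x_1), F_2(x_2)\big)\; c_{2,3}\big(F_2(x_2), F_3(x_3)\big)\; c_{1,3|2}\big(F_{1|2}(x_1 | x_2),\, F_{3|2}(x_3 | x_2)\big),$$

where, by (2), $F_{1|2}(x_1 | x_2) = h(F_1(x_1), F_2(x_2))$ is computed with the copula $c_{1,2}$ and $F_{3|2}(x_3 | x_2) = h(F_3(x_3), F_2(x_2))$ with the copula $c_{2,3}$.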



2.3. Matrix representation of regular vines 

To develop statistical inference algorithms for R-vines we need a convenient
way of representing an R-vine. Storing the nested set of trees is too expensive
and does not allow for an easy way to describe inference algorithms.

Morales-Nápoles (2008) uses a lower triangular matrix to store an R-vine.
The idea is to store the constraint set of an R-vine in columns of an $n$-dimensional
lower triangular matrix. We hence specify how the information from the lower
triangular matrix should be read by defining a constraint set for the matrix.
In the next section we show how the structure of R-vine matrices
can be used to encode corresponding pair-copula types and parameters. While
Morales-Nápoles (2008) used the matrix representation of R-vines for counting
the number of different R-vines, we will subsequently exploit this structure for
likelihood computation and a sampling procedure.

Definition 2.6 (Matrix constraint set). Let $M = (m_{i,j})_{i,j=1,\ldots,n}$ be a lower
triangular matrix. The $i$-th constraint set for $M$ is

$$C_M(i) = \{(\{m_{i,i}, m_{k,i}\}, D) \mid k = i+1, \ldots, n,\ D = \{m_{k+1,i}, \ldots, m_{n,i}\}\} \qquad (3)$$

for $i = 1, \ldots, n-1$. If $k = n$ we set $D = \emptyset$. The constraint set for matrix $M$
is the union $\mathcal{CM} = C_M(1) \cup \ldots \cup C_M(n-1)$. For the elements of the constraint
set $(\{m_{i,i}, m_{k,i}\}, D) \in \mathcal{CM}$ we call $\{m_{i,i}, m_{k,i}\}$ the conditioned set and $D$ the
conditioning set.

Every element of the constraint set is made up of a diagonal entry $m_{i,i}$, an
entry in the same column below the diagonal $m_{k,i}$ and all the elements following
in that column $\{m_{k+1,i}, \ldots, m_{n,i}\}$, $k = i+1, \ldots, n$, $i = 1, \ldots, n-1$.

To demonstrate this idea, we can compare the constraint sets defined by the
example matrix $M^*$ with the constraint sets of the R-vine in Figure 1.

$M^* =$ [7 × 7 lower triangular matrix; in this copy only its first column, $(7, 4, 5, 1, 2, 3, 6)$ read from top to bottom, and its bottom-right entry, $2$, are legible] $\qquad (4)$
In the first column of $M^*$ we have the diagonal entry $m_{1,1} = 7$ and the
element $m_{4,1} = 1$ in the fourth row. According to the definition above this
gives $(\{7, 1\}, \{2, 3, 6\}) \in \mathcal{CM}^*$, which corresponds to the constraint set of the
rightmost edge of $T_4$ in the R-vine in Figure 1.
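
Reading the constraint set off a matrix is easy to automate. The Python sketch below is illustrative only. The function name `matrix_constraint_set` is hypothetical, and the example matrix is an assumption: its first column is the one quoted in the text for $M^*$, while the remaining columns were chosen here so that the resulting constraint set reproduces the edge labels of the seven-variable R-vine of Figure 1; they need not coincide with the entries of $M^*$ in (4).

```python
def matrix_constraint_set(M):
    """Constraint set CM of a lower triangular matrix M (Definition 2.6)."""
    n = len(M)
    result = set()
    for i in range(n - 1):          # column index, 0-based
        for k in range(i + 1, n):   # row index below the diagonal
            conditioned = frozenset({M[i][i], M[k][i]})
            conditioning = frozenset(M[r][i] for r in range(k + 1, n))
            result.add((conditioned, conditioning))
    return result


# Assumed example matrix (zeros above the diagonal are padding only).
M = [
    [7, 0, 0, 0, 0, 0, 0],
    [4, 4, 0, 0, 0, 0, 0],
    [5, 6, 5, 0, 0, 0, 0],
    [1, 5, 6, 1, 0, 0, 0],
    [2, 1, 1, 6, 6, 0, 0],
    [3, 2, 3, 3, 2, 3, 0],
    [6, 3, 2, 2, 3, 2, 2],
]
cm = matrix_constraint_set(M)
print(len(cm))                                          # 21
print((frozenset({7, 1}), frozenset({2, 3, 6})) in cm)  # True
```

The 21 elements correspond to the $7 \cdot 6 / 2$ edges of an R-vine on seven variables.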

Before we formally define an R-vine matrix (that will be shown to code all
information included in an R-vine) we need two sets that will help us characterize
the matrix form and will ensure the proximity condition required for R-vines
(see Definition 2.1). For a lower triangular matrix $M = (m_{i,j})_{i,j=1,\ldots,n}$ set for
$i = 1, \ldots, n-1$,

$$B_M(i) := \{(m_{i,i}, D) \mid k = i+1, \ldots, n;\ D = \{m_{k,i}, \ldots, m_{n,i}\}\},$$
$$\tilde{B}_M(i) := \{(m_{k,i}, D) \mid k = i+1, \ldots, n;\ D = \{m_{i,i}\} \cup \{m_{k+1,i}, \ldots, m_{n,i}\}\}.$$

Now we can define an R-vine matrix.



Definition 2.7 (R-vine matrix). A lower triangular matrix $M = (m_{i,j})_{i,j=1,\ldots,n}$
is called an R-vine matrix if for $i = 1, \ldots, n-1$ and for all $k = i+1, \ldots, n-1$
there is a $j$ in $i+1, \ldots, n-1$ with

$$(m_{k,i}, \{m_{k+1,i}, \ldots, m_{n,i}\}) \in B_M(j) \quad \text{or} \quad \in \tilde{B}_M(j). \qquad (5)$$

It can be shown that the following two properties follow from (5):

(i) $\{m_{i,i}, \ldots, m_{n,i}\} \subset \{m_{j,j}, \ldots, m_{n,j}\}$ for $1 \le j < i \le n$.

(ii) $m_{i,i} \notin \{m_{i+1,i+1}, \ldots, m_{n,i+1}\}$ for $i = 1, \ldots, n-1$.

Condition (i) states that every column contains all the entries that a column
to the right contains, while condition (ii) assures that there is a new entry on
the diagonal in every column. Condition (5) is the essential counterpart to the
proximity condition in the definition of an R-vine (see Definition 2.1). Note
that Morales-Nápoles (2008) used a different condition to ensure the proximity
condition.

As an example, one may check that $M^*$ given in (4) fulfills condition (5) and
is in fact an R-vine matrix.
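
Condition (5) lends itself to a direct check. The following Python sketch is an illustration under stated assumptions: the function name `is_rvine_matrix` and both 4 × 4 example matrices are made up here, and the function tests condition (5) only; it does not validate that the input is lower triangular or that its entries are sensible indices.

```python
def is_rvine_matrix(M):
    """Check condition (5) of Definition 2.7 for a lower triangular matrix."""
    n = len(M)

    def m(i, k):                      # 1-based access, as in the text
        return M[i - 1][k - 1]

    def tail(k, j):                   # {m_{k,j}, ..., m_{n,j}}
        return frozenset(m(r, j) for r in range(k, n + 1))

    def B(j):
        return {(m(j, j), tail(k, j)) for k in range(j + 1, n + 1)}

    def B_tilde(j):
        return {(m(k, j), frozenset({m(j, j)}) | tail(k + 1, j))
                for k in range(j + 1, n + 1)}

    for i in range(1, n):
        for k in range(i + 1, n):
            target = (m(k, i), tail(k + 1, i))
            if not any(target in (B(j) | B_tilde(j)) for j in range(i + 1, n)):
                return False
    return True


valid = [[4, 0, 0, 0], [2, 3, 0, 0], [1, 2, 2, 0], [3, 1, 1, 1]]
invalid = [[4, 0, 0, 0], [1, 3, 0, 0], [2, 2, 2, 0], [3, 1, 1, 1]]
print(is_rvine_matrix(valid))    # True
print(is_rvine_matrix(invalid))  # False
```

In the second matrix the first column codes the pair $\{4, 2\}$ given $\{3\}$, although the first tree (edges $\{1,2\}$, $\{1,3\}$, $\{3,4\}$) contains no edge $\{2,3\}$, so the proximity condition fails.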

The following simple properties of an R-vine matrix can be seen directly 
from the definition. 



Properties 2.8. (i) All elements in a column are different.
(ii) Deleting the first row and column from an $n$-dimensional R-vine matrix
gives an $(n-1)$-dimensional R-vine matrix.



We have seen that the matrix $M^*$ codes all information needed to represent
the R-vine in Figure 1. The proof that there is an equivalent R-vine matrix
with the same constraint set for every R-vine and vice versa can be found in
Dißmann. In the proof it is shown that the constraint set $\mathcal{CV}$ of an R-vine
is in fact equal to the constraint set $\mathcal{CM}$ of a corresponding R-vine matrix $M$.
Note, however, that the matrix corresponding to an R-vine is not unique. As a
simple example consider the matrix obtained after an exchange of the elements
2 and 3 in the lower right 2 by 2 corner of $M^*$. It defines the same R-vine as
$M^*$.



2.4. Evaluation of the joint regular vine density 

We now use the matrix representation for R-vines presented in the previous
section to make more visible which copulae have to be used to build a density
of the R-vine distribution. In particular, we provide a novel algorithm to
efficiently evaluate the conditional distribution functions of an arbitrary
R-vine copula. This is a non-trivial task, since the order of the conditioning
variables required is not obvious. For this purpose we require an R-vine matrix
that codes information about conditioned and conditioning variables. Let
$M = (m_{i,j})_{i,j=1,\ldots,n}$ be an R-vine matrix corresponding to the R-vine $\mathcal{V}$.

The density of the R-vine distribution is a product of copula densities indexed by
$\mathcal{CV}$, which is equal to $\mathcal{CM}$ defined in (3). Hence the R-vine density is

$$f_{1 \ldots n} = \prod_{j=1}^{n} f_j \prod_{k=n-1}^{1} \prod_{i=n}^{k+1} c_{m_{k,k},\, m_{i,k} \,|\, m_{i+1,k}, \ldots, m_{n,k}}, \qquad (6)$$

where arguments of all functions have been omitted to shorten the notation.

We now have to show how the conditional distributions which are arguments
of bivariate copulae in (6) are obtained. We will show this in the algorithm below
where the evaluation of the fully parametric form of an R-vine distribution
is described. For this purpose we first need to specify two additional square
matrices $T = (t_{i,j})_{i,j=1,\ldots,n}$ and $P = (p_{i,j})_{i,j=1,\ldots,n}$ that will contain information
about types and parameters of the bivariate copulae in (6).

Since for all $j = 1, \ldots, n-1$, $i = j+1, \ldots, n$ the entry $m_{i,j}$ of $M$ codes the
copula of the variables indexed by $m_{j,j}$ and $m_{i,j}$ conditional on the variables
indexed by $\{m_{i+1,j}, \ldots, m_{n,j}\}$ we let $t_{i,j}$ describe the type of this copula (e.g.,
Normal, Clayton, etc.) and let $p_{i,j}$ contain parameters of this copula (note
that some copulae require more than one parameter; we can store them, e.g.,
in additional matrices or using a multi-dimensional array instead of a matrix).
An example of such a specification for $M^*$ (see (4)) is shown in Figure 2.

[Figure 2 (image not reproduced): the R-vine matrix $M^*$ shown next to the copula type matrix $T = (t_{i,j})$ and the parameter matrix $P = (p_{i,j})$, with two example entries highlighted.]

Figure 2: The copula with conditioned variables indexed by $\{4, 5\}$ and conditioning variables
indexed by $\{1,2,3\}$, i.e., $c_{4,5|123}$, is stored in row 4 of $T$ and $P$ (the column index is illegible in this copy).
A second, unconditional copula (label illegible in this copy) is of the type $t_{7,4}$ and has the parameter $p_{7,4}$.
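
The triple $(M, T, P)$ can be held in three equally shaped nested lists. The Python sketch below is a minimal illustration of this storage scheme; the helper name `describe_entry`, the family labels and the parameter values are all hypothetical, and parameters are stored as tuples so that families with more than one parameter fit into a single matrix entry.

```python
def describe_entry(M, T, P, i, j):
    """Decode entry (i, j), 1-based with i > j, of an R-vine specification.

    Returns the conditioned pair (m_jj, m_ij), the conditioning set
    (m_{i+1,j}, ..., m_{n,j}), the copula type t_ij and the parameters p_ij.
    """
    n = len(M)
    conditioned = (M[j - 1][j - 1], M[i - 1][j - 1])
    conditioning = tuple(M[r][j - 1] for r in range(i, n))
    return conditioned, conditioning, T[i - 1][j - 1], P[i - 1][j - 1]


# Hypothetical three-dimensional specification.
M = [[3, 0, 0],
     [1, 2, 0],
     [2, 1, 1]]
T = [[None, None, None],
     ["gaussian", None, None],
     ["clayton", "t", None]]
P = [[None, None, None],
     [(0.4,), None, None],
     [(2.0,), (0.5, 4.0), None]]

print(describe_entry(M, T, P, 2, 1))  # ((3, 1), (2,), 'gaussian', (0.4,))
print(describe_entry(M, T, P, 3, 2))  # ((2, 1), (), 't', (0.5, 4.0))
```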



Next, we find a recursive algorithm to calculate the conditional
distributions. For convenience we will assume that the diagonal entries of $M$ are
ordered from $n$ to $1$, i.e., $m_{k,k} = n - k + 1$. Note that the reordered matrix is
equivalent to the original matrix, which means it induces the same R-vine but
with relabeled indices. The copula type and parameter matrices are unaffected
by this reordering. To proceed, we introduce the maximum matrix of $M$, denoted
by $\tilde{M}$. It is $\tilde{M} = (\tilde{m}_{i,k})_{i,k=1,\ldots,n}$ with $\tilde{m}_{i,k} = \max\{m_{i,k}, \ldots, m_{n,k}\}$ for
all $k = 1, \ldots, n$ and $i = k, \ldots, n$. In words, $\tilde{m}_{i,k}$ is the maximum of all entries
in the $k$-th column of $M$ from the bottom up to the $i$-th element. Note that
$\tilde{m}_{n,k} = m_{n,k}$ for all $k = 1, \ldots, n$, since $\tilde{m}_{n,k}$ is the maximum over only one
element, and since the element on the diagonal is a new element in each column,
it is $\tilde{m}_{k,k} = m_{k,k} = n - k + 1$ for all $k = 1, \ldots, n$.
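
The maximum matrix is a column-wise running maximum taken from the bottom row upwards. A minimal Python sketch follows; the function name `maximum_matrix` and the 4 × 4 example are assumptions made here, and entries above the diagonal are simply left at zero.

```python
def maximum_matrix(M):
    """Return the maximum matrix of a lower triangular matrix M.

    Entry (i, k) holds max{m_{i,k}, ..., m_{n,k}}; all entries of M on and
    below the diagonal are assumed to be positive integers.
    """
    n = len(M)
    Mt = [[0] * n for _ in range(n)]
    for k in range(n):                      # columns
        running = 0
        for i in range(n - 1, k - 1, -1):   # rows, bottom up to the diagonal
            running = max(running, M[i][k])
            Mt[i][k] = running
    return Mt


M = [[4, 0, 0, 0],
     [2, 3, 0, 0],
     [1, 2, 2, 0],
     [3, 1, 1, 1]]
for row in maximum_matrix(M):
    print(row)
# [4, 0, 0, 0]
# [3, 3, 0, 0]
# [3, 2, 2, 0]
# [3, 1, 1, 1]
```

As stated above, the bottom row and the diagonal of the result agree with those of $M$.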

Algorithm 2.1 shows how to compute the density for a given R-vine copula
specification, where $h(\cdot, \cdot \mid t_{i,k}, p_{i,k})$ in Line 15 denotes the $h$-function (2) for the
copula type $t_{i,k}$ with parameters $p_{i,k}$ and the matrices $V^{\text{direct}}$ and $V^{\text{indirect}}$ are
introduced to store the arguments of the bivariate copulae in (6), where their
notation is due to the order of the arguments in Line 15.

The outer for-loop of the algorithm iterates over the columns of $M$ from
right to left, starting with $n-1$. The inner for-loop iterates over the rows
from the bottom up to one element below the diagonal entry of $M$. Therefore,
Line 14 of Algorithm 2.1 is executed once for every edge of the R-vine with the
corresponding copula type and parameters.

Note that we do not need to initialize $(v^{\text{indirect}}_{n,1}, v^{\text{indirect}}_{n,2}, \ldots, v^{\text{indirect}}_{n,n})$ because
it is $\tilde{m}_{n,k} = m_{n,k}$ for all $k = 1, \ldots, n-1$ and hence, we always select a $v^{\text{direct}}$
in Line 10 for $i = n$.

The crucial point in the algorithm is how the conditional distributions that
are arguments of bivariate copulae in (6), denoted as $z^{(1)}_{i,k}$ and $z^{(2)}_{i,k}$, are selected.
Therefore, we show that

$$z^{(1)}_{i,k} = F_{m_{k,k} | \{m_{i+1,k}, \ldots, m_{n,k}\}}(x_{m_{k,k}} \mid x_{m_{i+1,k}}, \ldots, x_{m_{n,k}})$$

and

$$z^{(2)}_{i,k} = F_{m_{i,k} | \{m_{i+1,k}, \ldots, m_{n,k}\}}(x_{m_{i,k}} \mid x_{m_{i+1,k}}, \ldots, x_{m_{n,k}})$$

for $k = n-1, \ldots, 1$ and $i = n, \ldots, k+1$.

Algorithm 2.1 Density of an R-vine specification.

Input: R-vine specification in matrix form, i.e., $M$, $T$, $P$, where $m_{k,k} = n - k + 1$, $k = 1, \ldots, n$.
Output: Density of the R-vine distribution at $(x_1, \ldots, x_n)$ for the given R-vine specification.
 1: Set $F = 1$.
 2: Let $V^{\text{direct}} = (v^{\text{direct}}_{i,k} \mid i,k = 1, \ldots, n)$.
 3: Let $V^{\text{indirect}} = (v^{\text{indirect}}_{i,k} \mid i,k = 1, \ldots, n)$.
 4: Set $(v^{\text{direct}}_{n,1}, v^{\text{direct}}_{n,2}, \ldots, v^{\text{direct}}_{n,n}) = (F_n(x_n), F_{n-1}(x_{n-1}), \ldots, F_1(x_1))$.
 5: Let $\tilde{M} = (\tilde{m}_{i,k} \mid i,k = 1, \ldots, n)$ with $\tilde{m}_{i,k} = \max\{m_{i,k}, \ldots, m_{n,k}\}$ for all $k = 1, \ldots, n$ and $i = k, \ldots, n$.
 6: for $k = n-1, \ldots, 1$ do {Iteration over the columns of $M$}
 7:   for $i = n, \ldots, k+1$ do {Iteration over the rows of $M$}
 8:     Set $z^{(1)}_{i,k} = v^{\text{direct}}_{i,k}$.
 9:     if $\tilde{m}_{i,k} = m_{i,k}$ then
10:       Set $z^{(2)}_{i,k} = v^{\text{direct}}_{i,\, n - \tilde{m}_{i,k} + 1}$.
11:     else
12:       Set $z^{(2)}_{i,k} = v^{\text{indirect}}_{i,\, n - \tilde{m}_{i,k} + 1}$.
13:     end if
14:     Set $F = F \cdot c(z^{(1)}_{i,k}, z^{(2)}_{i,k} \mid t_{i,k}, p_{i,k})$.
15:     Set $v^{\text{direct}}_{i-1,k} = h(z^{(1)}_{i,k}, z^{(2)}_{i,k} \mid t_{i,k}, p_{i,k})$ and $v^{\text{indirect}}_{i-1,k} = h(z^{(2)}_{i,k}, z^{(1)}_{i,k} \mid t_{i,k}, p_{i,k})$.
16:   end for
17: end for
18: return $F$



We argue by induction and start with i = n and k arbitrary in 1, . . . , i. 

It is ^ni = ^nlfc'" = F„_A;+i(a;„_fe+i) = i^mfc,fc(x,„^_J, and since m„,fc = 
m„,fc, it is z^^^ = v^"^;!}^^^_^_^ ^ fc(a;m„, J- Thereby, the statement is vahd 
for i = n. 

We assume that for all n > i > / for an / > 2, i.e., for all fc = i, . . . , 1 it is 



W'i.fc [{"ifc.fcm.+l, "l7i,fc} V'^'"i,fe l'^™fc,fc ' -^mi+l,*: ■ • ■ : -^mn.fc J- V"J 

If we proceed with step $l$, the algorithm selects $z^{(1)}_{l,k} = v^{\text{direct}}_{l,k}$ in Line 8. By
Equation (8) it is $z^{(1)}_{l,k} = F_{m_{k,k}|\{m_{l+1,k},\dots,m_{n,k}\}}(x_{m_{k,k}} \mid x_{m_{l+1,k}}, \dots, x_{m_{n,k}})$, which
proves that the algorithm selects the correct entry for $z^{(1)}_{l,k}$.

By Definition 2.7, Property (iii) we know that there exists a $j$ in $k+1, \dots, n-1$ with

$(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) \in B_M(j) \cup \tilde{B}_M(j)$. (9)
Let $(x, D) \in B_M(j)$, then $x$ and $D$ consist of elements of the $j$-th column of
$M$. Thus, $\max\{x, \max D\} = m_{j,j}$. This is also true for $(x, D) \in \tilde{B}_M(j)$. If we
take the maximum over all elements on the left and right side of (9), it must
hold that $\tilde{m}_{l,k} = m_{j,j}$, and since $m_{j,j} = n - j + 1$ we know that $j = n - \tilde{m}_{l,k} + 1$.
This explains the indexation of $v$ in Lines 10 and 12.

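Now we distinguish between the cases $(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) \in B_M(j)$
and $(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) \in \tilde{B}_M(j)$. For $(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) \in B_M(j)$
it is

$(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) = (m_{j,j}, \{m_{l+1,j}, \dots, m_{n,j}\}) \in B_M(j)$. (10)

Hence, it follows $m_{l,k} = m_{j,j} = \tilde{m}_{l,k}$. Thus, it is $m_{l,k} = \tilde{m}_{l,k}$ in Line 9 of the
algorithm, and the algorithm defines $z^{(2)}_{l,k} = v^{\text{direct}}_{l,(n-\tilde{m}_{l,k}+1)} = v^{\text{direct}}_{l,j}$. Using the
induction assumption (8) it follows

$z^{(2)}_{l,k} = F_{m_{j,j}|\{m_{l+1,j},\dots,m_{n,j}\}}(x_{m_{j,j}} \mid x_{m_{l+1,j}}, \dots, x_{m_{n,j}})$

and by (10)

$z^{(2)}_{l,k} = F_{m_{l,k}|\{m_{l+1,k},\dots,m_{n,k}\}}(x_{m_{l,k}} \mid x_{m_{l+1,k}}, \dots, x_{m_{n,k}})$.

The argumentation for $(m_{l,k}, \{m_{l+1,k}, \dots, m_{n,k}\}) \in \tilde{B}_M(j)$ is similar. This
proves the statement.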



2.5. Inference of regular vines

Having now established Algorithm 2.1 to evaluate a given R-vine copula
density, the determination of the corresponding log likelihood expression $L$ is
straightforward by substituting Line 1 through "$L = 0$" and Line 14 through
"$L = L + \log c(z^{(1)}_{i,k}, z^{(2)}_{i,k} \mid t_{i,k}, p_{i,k})$", and by returning $L$ instead of the density value in the last
line. The log likelihood can then be used, for example, for maximum likelihood
estimation of the pair-copula parameters.
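The log likelihood variant of the density algorithm can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every pair-copula is taken to be Gaussian, so the family matrix $T$ is omitted and the parameter matrix `P` holds (partial) correlations; the function names `gauss_density`, `gauss_h` and `rvine_loglik` are hypothetical and not taken from the paper's R implementation. The input `u` is assumed to already contain the margins $F_j(x_j)$.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def gauss_density(u1, u2, rho):
    """Density c(u1, u2 | rho) of the bivariate Gaussian copula."""
    x, y = _STD.inv_cdf(u1), _STD.inv_cdf(u2)
    s = 1.0 - rho * rho
    return exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)) / sqrt(s)

def gauss_h(u1, u2, rho):
    """h-function of the Gaussian copula: conditional distribution F(u1 | u2)."""
    x, y = _STD.inv_cdf(u1), _STD.inv_cdf(u2)
    return _STD.cdf((x - rho * y) / sqrt(1.0 - rho * rho))

def rvine_loglik(M, P, u):
    """Log likelihood of one observation for an all-Gaussian R-vine.

    M: n x n lower triangular R-vine matrix with diagonal n, n-1, ..., 1.
    P: n x n matrix of pair-copula parameters (entries below the diagonal).
    u: list with u[j-1] = F_j(x_j).
    Indices i, k below are 1-based as in the algorithm.
    """
    n = len(M)
    m = lambda i, k: M[i - 1][k - 1]
    m_max = lambda i, k: max(M[r - 1][k - 1] for r in range(i, n + 1))
    v_dir = [[None] * (n + 1) for _ in range(n + 1)]
    v_ind = [[None] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v_dir[n][k] = u[n - k]            # F_{n-k+1}(x_{n-k+1})
    loglik = 0.0
    for k in range(n - 1, 0, -1):         # columns of M
        for i in range(n, k, -1):         # rows of M
            z1 = v_dir[i][k]
            j = n - m_max(i, k) + 1
            z2 = v_dir[i][j] if m(i, k) == m_max(i, k) else v_ind[i][j]
            rho = P[i - 1][k - 1]
            loglik += log(gauss_density(z1, z2, rho))
            v_dir[i - 1][k] = gauss_h(z1, z2, rho)
            v_ind[i - 1][k] = gauss_h(z2, z1, rho)
    return loglik
```

For a three-dimensional example one could call `rvine_loglik([[3, 0, 0], [1, 2, 0], [2, 1, 1]], [[0, 0, 0], [0.3, 0, 0], [0.6, 0.5, 0]], [0.2, 0.7, 0.45])`; with Gaussian pair-copulae the result should agree with the log density of the trivariate Gaussian copula whose correlation matrix is implied by the partial correlations.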

For vines there is a second estimation procedure which is typically used in 
the literature, namely sequential estimation. This method exploits the tree by 
tree structure of vines by separately estimating the parameter(s) of each pair- 
copula in the first tree, then computing the transformed variables for the second 
tree using h-functions, again separately estimating the conditional pair-copulae
in the second tree, and so on. In doing so, only bivariate estimation is required 
and hence this method is quite fast. Moreover, the estimated parameters are 
typically good starting values for joint maximum likelihood estimation. 

With regard to Algorithm 2.1 this means that we only have to insert a new
line before Line 14 where the copula parameter $p_{i,k}$ is estimated based on the
observations $z^{(1)}_{i,k}$ and $z^{(2)}_{i,k}$ and for copula family $t_{i,k}$.

Furthermore, sampling from R-vine specifications can be performed using
the inverse probability integral transform (see Devroye (1986)). E.g., in the
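Algorithm 2.2 Simulation of an R-vine specification.

Input: R-vine specification in matrix form, i.e., $M$, $T$, $P$, where $m_{k,k} = n - k + 1$, $k = 1, \dots, n$.

Output: Random observations $(x_1, \dots, x_n)$ from the R-vine specification.

 1: Let $u_1, \dots, u_n$ be independent uniform samples.
 2: Let $V^{\text{direct}} = (v^{\text{direct}}_{i,k} \mid i, k = 1, \dots, n)$.
 3: Let $V^{\text{indirect}} = (v^{\text{indirect}}_{i,k} \mid i, k = 1, \dots, n)$.
 4: Set $(v^{\text{direct}}_{n,1}, v^{\text{direct}}_{n,2}, \dots, v^{\text{direct}}_{n,n}) = (u_1, u_2, \dots, u_n)$.
 5: Let $\tilde{M} = (\tilde{m}_{i,k} \mid i, k = 1, \dots, n)$ with $\tilde{m}_{i,k} = \max\{m_{i,k}, \dots, m_{n,k}\}$ for all $k = 1, \dots, n-1$ and $i = k, \dots, n$.
 6: Set $x_1 = v^{\text{direct}}_{n,n}$.
 7: for $k = n-1, \dots, 1$ do {Iteration over the columns of M}
 8:   for $i = k+1, \dots, n$ do {Iteration over the rows of M}
 9:     if $m_{i,k} = \tilde{m}_{i,k}$ then
10:       Set $z^{(2)}_{i,k} = v^{\text{direct}}_{i,(n-\tilde{m}_{i,k}+1)}$.
11:     else
12:       Set $z^{(2)}_{i,k} = v^{\text{indirect}}_{i,(n-\tilde{m}_{i,k}+1)}$.
13:     end if
14:     Set $v^{\text{direct}}_{n,k} = h^{-1}(v^{\text{direct}}_{n,k}, z^{(2)}_{i,k} \mid t_{i,k}, p_{i,k})$.
15:   end for
16:   Set $x_{n-k+1} = v^{\text{direct}}_{n,k}$.
17:   for $i = n, \dots, k+1$ do {Iteration over the rows of M}
18:     Set $z^{(1)}_{i,k} = v^{\text{direct}}_{i,k}$.
19:     Set $v^{\text{direct}}_{i-1,k} = h(z^{(1)}_{i,k}, z^{(2)}_{i,k} \mid t_{i,k}, p_{i,k})$ and $v^{\text{indirect}}_{i-1,k} = h(z^{(2)}_{i,k}, z^{(1)}_{i,k} \mid t_{i,k}, p_{i,k})$.
20:   end for
21: end for
22: return $(x_1, \dots, x_n)$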



bivariate case, let $C$ be the copula under consideration and let $v_1$ and $v_2$ be two
independent uniform samples. Using the inverse of the h-function, the vector
$u = (u_1, u_2)'$ given by

$u_1 = v_1$, and $u_2 = h^{-1}(v_2, u_1) = F^{-1}_{2|1}(v_2 \mid u_1)$,

is then a sample from the copula $C$ with uniform margins.
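The bivariate construction can be illustrated with a small Python sketch. It assumes a Gaussian copula, for which the h-function and its inverse have closed forms; the names `gauss_h`, `gauss_h_inv` and `sample_pair` are illustrative choices, not part of the paper.

```python
import random
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def gauss_h(u2, u1, rho):
    """h(u2, u1) = F_{2|1}(u2 | u1) for a Gaussian copula with correlation rho."""
    return _STD.cdf((_STD.inv_cdf(u2) - rho * _STD.inv_cdf(u1)) / sqrt(1.0 - rho * rho))

def gauss_h_inv(v2, u1, rho):
    """Inverse of gauss_h in its first argument: returns u2 with h(u2, u1) = v2."""
    return _STD.cdf(_STD.inv_cdf(v2) * sqrt(1.0 - rho * rho) + rho * _STD.inv_cdf(u1))

def _open_uniform(rng):
    """Uniform draw on the open interval (0, 1); random() may return exactly 0."""
    v = rng.random()
    while v == 0.0:
        v = rng.random()
    return v

def sample_pair(rho, rng):
    """One draw (u1, u2) from the Gaussian copula via u1 = v1, u2 = h^{-1}(v2, u1)."""
    v1, v2 = _open_uniform(rng), _open_uniform(rng)
    u1 = v1
    u2 = gauss_h_inv(v2, u1, rho)
    return u1, u2
```

Calling `sample_pair(0.5, random.Random(1))` repeatedly yields pairs with uniform margins and the dependence of the chosen copula; other families would only require replacing the two h-function helpers.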

This idea can be generalized to R-vines and the corresponding algorithm
is given in Algorithm 2.2, where we again assume that the diagonal entries of the R-vine
matrix are ordered from $n$ to 1, and in particular the selection of the different
$v$ and $z$ entries is the same as in Algorithm 2.1. More details on this can be found
in Dißmann (2010, Section 5.3).
3. Selecting regular vine distributions



Fitting an R-vine copula specification to a given dataset requires the follow- 
ing separate tasks: 

(a) Selection of the R-vine (structure), i.e., selecting which unconditioned and 
conditioned pairs to use. 

(b) Choice of a bivariate copula family for each pair selected in (a). 

(c) Estimation of the corresponding parameter(s) for each copula.

Since all three steps are needed for an R-vine copula specification, one way of
finding the "best" model is to accomplish steps (b) and (c) for all possible
R-vine constructions. Since the number of possible R-vines on $n$ variables increases
very rapidly with $n$ ($n!/2 \times 2^{\binom{n-2}{2}}$ as shown in Morales-Napoles et al. (2010)),
this is not feasible. In addition to the fast growing number of possible R-vines,
some methods to decide which bivariate copula family to use depend on the
interpretation of plots, e.g., K- or Chi-plots (see Genest and Favre (2007)), and
therefore need manual interaction. On the one hand, we do not use such methods
in order to retain objectivity and, on the other hand, it is again not feasible to apply them to
every possible copula in every possible R-vine decomposition. In particular, in
Section 4 we will fit a model to a 16-dimensional dataset leaving 120 copulae to
select. This is not practicable to do manually.
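To get a feeling for the growth rate quoted above, the count $n!/2 \times 2^{\binom{n-2}{2}}$ and the number of pair-copulae $n(n-1)/2$ can be evaluated directly. The following Python lines are a small sketch; the function names are illustrative.

```python
from math import comb, factorial

def number_of_rvines(n):
    """Number of R-vines on n variables: n!/2 * 2^C(n-2, 2), for n >= 2."""
    return factorial(n) // 2 * 2 ** comb(n - 2, 2)

def number_of_pair_copulae(n):
    """Number of pair-copulae in an R-vine on n variables: n(n-1)/2."""
    return n * (n - 1) // 2

for n in (3, 4, 5, 16):
    print(n, number_of_rvines(n), number_of_pair_copulae(n))
```

Already for five variables there are 480 R-vines, and for 16 variables the count has dozens of digits, which illustrates why an exhaustive search is out of reach.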

Therefore, we developed a sequential, heuristic method to select the tree
structure of the R-vine. Since our proposed method for (a) depends on the
copulae selected in (b) and estimated in (c), copula selection is covered in Section
3.2. A simulation study to evaluate our approach is presented in Appendix A.
In Section 4 we will apply the techniques.

3.1. Sequential method to select a regular vine copula specification based on Kendall's tau

To select one possible R-vine for a given dataset it is necessary to decide for 
which pairs of variables we want to specify copulae. We proceed sequentially, 
starting by defining the first tree $T_1 = (N_1, E_1)$ for the R-vine, continuing with
the second tree, and so on. The trees are selected in such a way that the chosen 
pairs model the strongest pairwise dependencies present (more details below). 
Later, we will refer to this method as the sequential method. Since we examine 
every tree separately, it is not guaranteed to find a global optimum, where global 
optimum is meant in terms of model fit, e.g., higher likelihood, smaller AIC/BIC 
or superior in terms of the likelihood-ratio based test for comparing non-nested
models proposed by Vuong (1989). However, we think this sequential approach
is reasonable because 

• the copula families specified in the first tree of the R-vine often have the
greatest influence on the model fit.

• it is more important to model the dependence structure between random
variables that have high dependencies correctly, because most copula
families can model independence and the copula distribution functions
for parameters close to independence are very similar.



• this approach minimizes the influence of rounding errors in later trees, 
which pairs with strong pairwise dependence are most prone to, e.g., when 
assessing the joint tail behavior of two variables. For pairs of variables 
close to independence, such issues are less relevant. 

• for real applications it is natural to assume that randomness is driven 
by the dependence of only some variables and not all. Therefore, if you 
choose the copulae with high dependence in the first trees, the transformed 
variables for the later trees will often be rather independent. We exemplify
this using the multivariate normal distribution, since we can easily
compute conditional dependence for multivariate normal distributions using
well known properties of the normal distribution (see, e.g., Anderson (2003)).

For example consider three jointly normally distributed random variables
$X_1$, $X_2$ and $X_3$ with pairwise correlations $\rho_{1,2}$, $\rho_{1,3}$ and $\rho_{2,3}$.

For the normal distribution we know that the correlation of $X_1$ and $X_2$
given $X_3$ can be calculated as follows:

$\rho_{1,2|3} := \rho(X_1|X_3, X_2|X_3) = \dfrac{\rho_{1,2} - \rho_{1,3}\rho_{2,3}}{\sqrt{1 - \rho_{1,3}^2}\,\sqrt{1 - \rho_{2,3}^2}}$.

Setting $\rho_{1,3} = \rho_{2,3} > \rho_{1,2} > 0$ we have $\rho_{1,2|3} = (\rho_{1,2} - \rho_{1,3}^2)/(1 - \rho_{1,3}^2) < \rho_{1,2}$,
since $\rho_{1,2} < 1$, and $\rho_{1,2|3} > -1$ because of the positive-definiteness
of the correlation matrix. Hence, if we fit the dependence for the two
pairs with higher correlation first (assumption $\rho_{1,3} = \rho_{2,3} > \rho_{1,2} > 0$) the
remaining correlation of $X_1$ and $X_2$ becomes smaller given $X_3$.

This is a desirable feature especially for datasets with a large number of 
variables, because we can truncate the R-vine specification and assume 
independence for the $k$ last trees to reduce the number of parameters
needed. For more information on this see the discussion of truncated R-vines
later in the paper and Brechmann et al. (2012).
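The three-variable normal example from the last item can be checked numerically. The sketch below simply evaluates the partial correlation formula given there; the function name and the chosen correlation values are illustrative assumptions.

```python
from math import sqrt

def partial_correlation(r12, r13, r23):
    """Correlation of X1 and X2 given X3 for jointly normal variables."""
    return (r12 - r13 * r23) / (sqrt(1.0 - r13 ** 2) * sqrt(1.0 - r23 ** 2))

# Two strong pairs (0.7) modeled first, one weaker pair (0.5) left over.
r12, r13, r23 = 0.5, 0.7, 0.7
print(partial_correlation(r12, r13, r23))  # about 0.0196, far below 0.5
```

With these values the remaining conditional correlation is close to zero, which is the behavior that motivates truncating later trees.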



We use Kendall's tau as a measure of dependence, since it measures de- 
pendence independently of the assumed distribution and hence, is especially 
useful when combining different (non-Gaussian) copula families. However the
described method works in the same way for every other measure of dependence
(see Brechmann (2010, Chapter 3) for an extensive discussion).
Algorithm 3.1 Sequential method to select an R-vine model based on Kendall's tau.

Input: Data $(x_{\ell 1}, \dots, x_{\ell n})$, $\ell = 1, \dots, N$ (realizations of i.i.d. random vectors).

Output: R-vine copula specification, i.e., $\mathcal{V}$, $B$.

1: Calculate the empirical Kendall's tau $\hat{\tau}_{j,k}$ for all possible variable pairs $\{j, k\}$, $1 \le j < k \le n$.
2: Select the spanning tree that maximizes the sum of absolute empirical Kendall's taus, i.e.,
   $\max \sum_{e = \{j,k\} \text{ in spanning tree}} |\hat{\tau}_{j,k}|$.
3: For each edge $\{j, k\}$ in the selected spanning tree, select a copula and estimate the corresponding parameter(s). Then compute the transformed observations $\hat{F}_{j|k}(x_{\ell j} \mid x_{\ell k})$ and $\hat{F}_{k|j}(x_{\ell k} \mid x_{\ell j})$, $\ell = 1, \dots, N$, using the fitted copula $\hat{C}_{jk}$.
4: for $i = 2, \dots, n-1$ do {Iteration over the trees}
5:   Calculate the empirical Kendall's tau $\hat{\tau}_{j,k|D}$ for all conditional variable pairs $\{j, k|D\}$ that can be part of tree $T_i$, i.e., all edges fulfilling the proximity condition (see Definition 2.1).
6:   Among these edges, select the spanning tree that maximizes the sum of absolute empirical Kendall's taus, i.e.,
     $\max \sum_{e = \{j,k|D\} \text{ in spanning tree}} |\hat{\tau}_{j,k|D}|$.
7:   For each edge $\{j, k|D\}$ in the selected spanning tree, select a conditional copula and estimate the corresponding parameter(s). Then compute the transformed observations $\hat{F}_{j|k \cup D}(x_{\ell j} \mid x_{\ell k}, x_{\ell D})$ and $\hat{F}_{k|j \cup D}(x_{\ell k} \mid x_{\ell j}, x_{\ell D})$, $\ell = 1, \dots, N$, using the fitted copula $\hat{C}_{jk|D}$.
8: end for



Kurowicka (2011) proposes another method to generate R-vines. She builds
the trees the other way around, starting with the last tree. By this method she
tries to generate an R-vine with the lowest dependencies in the top trees. This 
method depends on the partial correlations which contradicts the fact that we 
want to use other, non-Gaussian copulae. Partial correlations are used, since 
they can be calculated without knowing the exact R-vine structure of the first 
trees. 

Our method is summarized in Algorithm 3.1. To select the tree that maximizes
the sum of absolute empirical Kendall's taus (Steps 2 and 6) we use
a maximum spanning tree (MST) algorithm such as the Algorithm of Prim
(Cormen et al., 2001, Section 23.2). Typically such algorithms are described in
a way to find a minimal spanning tree, but the algorithms work in both ways.
Also note that in Steps 2 and 6 we are looking for a tree. We could look for a
star or a path instead, to obtain a C- or a D-vine structure, respectively. Note
that for a D-vine a Hamiltonian path has to be found, which corresponds to solving
a Traveling Salesman Problem. This is however NP-equivalent and therefore
rather inefficient to find a solution for, especially in higher dimensions.

Notice that an MST algorithm does not depend on the actual values of
the edges, instead it only uses their rank. Therefore, the algorithm leads to the 
same results if we transform the edge values by a monotone increasing function. 
Hence, in our field of application, where we want to find a tree with maximal 
values of taus we would get the same tree even if we took other weights like 
squared taus or another monotone increasing transformation. 
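The tree selection step and its invariance under monotone transformations of the weights can be illustrated with a compact version of Prim's algorithm. This is a plain Python sketch, not the implementation used in the paper; the function name, the data layout (weights keyed by `frozenset` pairs) and the example taus are assumptions made for illustration.

```python
def maximum_spanning_tree(nodes, weight):
    """Prim's algorithm for a maximum spanning tree.

    nodes:  iterable of node labels.
    weight: dict mapping frozenset({a, b}) to the edge weight.
    Returns the set of selected edges (frozensets). Raises ValueError
    if the graph given by `weight` is not connected.
    """
    nodes = list(nodes)
    in_tree = {nodes[0]}
    edges = set()
    while len(in_tree) < len(nodes):
        # Heaviest edge leaving the current tree.
        a, b = max(
            ((a, b) for a in in_tree for b in nodes
             if b not in in_tree and frozenset((a, b)) in weight),
            key=lambda e: weight[frozenset(e)],
        )
        edges.add(frozenset((a, b)))
        in_tree.add(b)
    return edges

# Hypothetical empirical Kendall's taus for four variables.
taus = {(1, 2): 0.6, (1, 3): -0.5, (1, 4): 0.1, (2, 3): 0.3, (2, 4): -0.4, (3, 4): 0.2}
abs_w = {frozenset(e): abs(t) for e, t in taus.items()}
sq_w = {frozenset(e): t * t for e, t in taus.items()}
tree_abs = maximum_spanning_tree([1, 2, 3, 4], abs_w)
tree_sq = maximum_spanning_tree([1, 2, 3, 4], sq_w)
```

In this example both weightings select the edges {1,2}, {1,3} and {2,4}, since squaring preserves the ranking of the absolute taus.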

How to select a copula, i.e., Steps 3 and 7 of Algorithm 3.1, is explained in
more detail in Section 3.2. A proof that this algorithm creates an R-vine, i.e.,
that we always find a tree in Steps 2 and 6 and further explanations are given 
in the following. 

An MST algorithm always leads to a tree when the input graph is connected. 
Therefore, we need to check this assumption to verify our method. 

This is obviously true for $T_1$, since we start with a complete graph. When
conducting the $i$-th step, we know that $T_{i-1}$ is a tree. The node set of tree $T_i$
is then given by $N_i = E_{i-1}$. Let $E'_i$ be the set of all possible edges in $T_i$ (see
Step 5 of Algorithm 3.1). This edge set is defined by

$E'_i = \{\{a, b\} \mid a, b \in N_i,\ \#(a \cap b) = 1\}$. (11)

The requirement $\#(a \cap b) = 1$ ensures the proximity condition of an R-vine. To
show that $(N_i, E'_i)$ is connected recall that connected means there is a path from
every single node to every other node. Let $a, b \in N_i$ be arbitrary nodes. Further,
let $n_1, n_2 \in N_{i-1}$ be two nodes from the previous tree with $n_1 \in a$ and $n_2 \in b$.
Since $n_1$ and $n_2$ are nodes of a tree, there is a path in $T_{i-1}$ from $n_1$ to $n_2$,
$n_1 \in e_1 \to \dots \to e_l \ni n_2$, $e_1, \dots, e_l \in E_{i-1}$, $l \ge 1$. We know that $n_1 \in a$ and
$n_1 \in e_1$. Without loss of generality we can assume that $a = e_1$. Otherwise, if
$e_1 \ne a$, we can extend the path by setting

$e_{l+1} = e_l, \ \dots, \ e_2 = e_1, \ e_1 = a, \ l = l + 1$.

Similarly we can assume $b = e_l$. Since $e_1, \dots, e_l$ induce a path, we know that
$\#(e_r \cap e_{r+1}) = 1$ for all $r = 1, \dots, l-1$. Hence $\{e_r, e_{r+1}\} \in E'_i$ for all
$r = 1, \dots, l-1$. Thus, we know that there is a path from $e_1 = a$ to $e_l = b$ and
$(N_i, E'_i)$ is a connected graph. Table 1 shows a concrete example of this idea.
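The candidate edge set (11) and the connectivity argument can be made concrete with a short Python sketch. Nodes of tree $T_i$ are represented as `frozenset` objects (the edges of $T_{i-1}$); the function names and the example tree are illustrative assumptions.

```python
from itertools import combinations

def candidate_edges(prev_edges):
    """All pairs of edges of T_{i-1} that share exactly one node (proximity condition)."""
    return {frozenset((a, b)) for a, b in combinations(prev_edges, 2) if len(a & b) == 1}

def is_connected(nodes, edges):
    """Depth-first search check that every node is reachable from the first one."""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        current = stack.pop()
        for edge in edges:
            if current in edge:
                for other in edge - {current}:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
    return len(seen) == len(nodes)

# A first tree on five variables: 1-2, 2-3, 3-4, 3-5.
tree_1 = [frozenset(e) for e in ((1, 2), (2, 3), (3, 4), (3, 5))]
possible = candidate_edges(tree_1)
print(len(possible), is_connected(tree_1, possible))
```

Here four candidate edges arise for the second tree and the resulting graph is connected, so a spanning tree can be selected among them.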

Finally, we give some more insight on how to calculate the empirical Kendall's
taus and select copula families. Define $E'_i$ as in (11). For all
$e \in E'_i$ we have to calculate the value of Kendall's tau, and for some of them
(those selected in the MST) we need to fit a copula based on two conditioned
variables. If $e \in E'_i$, $e = \{a, b\}$, connects variables $x_{C_{e,a}}$ with $x_{C_{e,b}}$ given the
variables $x_{D_e}$, we hence need the transformed variables $F_{C_{e,a}|D_e}(x_{C_{e,a}} \mid x_{D_e})$ and
$F_{C_{e,b}|D_e}(x_{C_{e,b}} \mid x_{D_e})$ which are obtained as described in (2). For these it is then
straightforward to calculate the empirical Kendall's tau and select a bivariate
copula family as outlined in the following section.

Table 1: Exemplification of the model selection Algorithm 3.1. (The graphs shown in the original table are not reproduced here.)

i = 1: Assume that we have 5 variables $N_1 = \{1, 2, 3, 4, 5\}$. The first graph is
always a complete graph, where we can connect every node with every other node.
Let us assume the Algorithm of Prim selects the solid edges. The concrete edge
values (Kendall's taus) are not of interest in this example.

i = 2: All edges from the previous step are now nodes. An edge is drawn whenever the
nodes share a common node in the previous tree (dashed and solid). We see
that the graph is connected and select the tree indicated by the solid edges.

i = 3: There are no options in this step. We need all edges to form a tree. Note, as soon as a
graph has a D-vine structure, there are no more options in the following trees because
it uniquely determines all following conditioned and conditioning sets.

3.2. Selecting pair-copula families sequentially

Besides the steps described above we need to select a copula family for every 
pair of variables. In the later application we take the following copula families 
into consideration (some properties are given in brackets): 

• Gaussian/Normal (tail-symmetric, no tail dependence), 

• Student-t (tail-symmetric, tail dependence),

• Gumbel (tail-asymmetric, upper tail dependence) and survival Gumbel
(tail-asymmetric, lower tail dependence),

• rotated Gumbel by 90 and 270 degrees (tail-asymmetric, no tail dependence),

• Frank (tail-symmetric, no tail dependence). 

In case of positive dependence this means that we can select among the Gaus- 
sian, Student-t, (survival) Gumbel and Frank copulae, while rotated Gumbel 
copulae can be used instead of Gumbel and survival Gumbel copulae when 
modeling negative dependence. Further, we will not use a Student-t copula if 
the maximum likelihood estimation leads to a degrees of freedom parameter 
higher than 30 because then the Student-t copula is too close to the Gaussian 
which can be used instead. 

Given these options we still have to decide which copula fits "best". We do
this using the AIC (Akaike, 1973) which corrects the log likelihood of a copula
for the number of parameters, i.e., the use of the Student-t copula is penalized
compared to the other copulae, since it is the only two parameter family under
consideration. Bivariate copula selection using the AIC has previously been
investigated in Manner (2007) and Brechmann (2010, Section 5.4) who found that
it is a quite reliable criterion, in particular in comparison to alternative criteria
such as copula goodness-of-fit tests. Selection proceeds by computing the AICs
for each possible family and then choosing the copula with smallest AIC. We will
also include the independence copula in the selection by performing a preliminary
independence test based on Kendall's tau as described in Genest and Favre
(2007). If this test indicates independence, no further steps are taken and the
independence copula is chosen.
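A minimal sketch of the preliminary independence test is given below. It assumes continuous data without ties and uses the standard large-sample result that, under independence, the empirical Kendall's tau is approximately normal with mean zero and variance $2(2N+5)/(9N(N-1))$; the function names are illustrative and the quadratic-time tau computation is chosen for clarity, not speed.

```python
from math import sqrt
from statistics import NormalDist

def kendall_tau(x, y):
    """Empirical Kendall's tau (no tie correction): (concordant - discordant) / #pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def independence_test(x, y):
    """Two-sided test of independence based on Kendall's tau.

    Returns (tau, test statistic, p-value); a large p-value suggests
    choosing the independence copula for this pair.
    """
    n = len(x)
    tau = kendall_tau(x, y)
    stat = sqrt(9.0 * n * (n - 1) / (2.0 * (2 * n + 5))) * abs(tau)
    p_value = 2.0 * (1.0 - NormalDist().cdf(stat))
    return tau, stat, p_value
```

A pair would be assigned the independence copula whenever the returned p-value exceeds the chosen level, for example 5%.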

Given the wide range of bivariate copula families available the above list 
of copulae clearly is not complete. For instance, we could also consider two 
parameter copula families such as the BB1 or BB7 with different lower and
upper tail dependence. These have previously been used as building blocks of
C- and D-vine copulae by Czado et al. (2010) and Nikoloulopoulos et al. (2012).



While already including copula families able to account for very different types 
of dependence, the above list can easily be extended by such families, which 
however increases the computational burden of the copula selection step. Using 
appropriate diagnostic tools for asymmetry and tail dependence as in the above 
two references, the required computational time can however be reduced. 



4. Modeling the residual dependency among daily returns of international financial indices

Copula based models are very commonly used in the area of multivariate 
modeling of financial returns. Here first appropriate marginal time series models 
are fitted to each financial return series and standardized residuals are formed. 
The dependency among these residuals is then modeled using a multivariate 
copula after a transformation to marginally uniform data using either an empirical
or parametric probability integral transformation. There has been empirical
evidence that different asymmetric and tail dependencies are present for different
pairs of variables, which cannot be captured using a multivariate Gaussian
or Student-t copula with a common degree of freedom (see, amongst others,
Longin and Solnik (1995, 2001) and Ang and Bekaert (2002)). Especially D-vines
have been shown to be very successful in the modeling of such dependency
patterns (see Aas et al. (2009), Min and Czado (2010) and Mendes et al.
(2010)), but also C-vines have recently been successfully applied (Czado et al.).
Mendes et al. (2010) however suggested that there should be more research
on how to choose D-vines including both the choice of the order of the nodes as
well as how to choose the pair-copula families. This paper is exactly answering
these questions and in our application we will investigate whether R-vine copulae
other than C- or D-vine and standard multivariate copulae are needed in
modeling the residual dependencies among financial returns.

For this we selected 16 international indices, including five equity, nine fixed 
income (bonds) and two commodity indices observed daily from 12/29/2001 un- 
til 12/14/2009 (2337 daily returns). All returns are unhedged against currency 
fluctuations and quoted in their home currency except for global indices which 
are stated in USD. In particular we choose the equity indices DAX, STOXX50, 
S&P500, MSCI- World and MSCI-EE, the fixed income indices IBOXX-G-3- 
5, IBOXX-G-7-10, IBOXX-E-1-3, IBOXX-E-5-7, IBOXX-E-IO-H, IBOXX-E- 
A, IBOXX-E-AA, IBOXX-E-AAA, IBOXX-E-BBB and the commodity indices 
Comm and Gold. For the bonds we selected maturities such that those of 
the German and the Euro bonds are disjoint, since German bonds (IBOXX-G) 
account for a large proportion of the Euro indices (IBOXX-E) giving rise to ex- 
tremely high Pearson correlations which are also observed between consecutive 
maturities (see the corresponding pairs in Figure 4 below). More information
about the selected indices can be found in Table 6.13 of Dißmann (2010).

For the first step we fitted univariate ARMA(1,1)-GARCH(1,1) models with
Student-t innovations using maximum likelihood estimation to all equity and
commodity indices and Gaussian innovations for all bond indices. Separate residual
analyses in Dißmann (2010, Section 6.3.1 and Appendix B.3) show no volatility
clusters and a good fit of the chosen innovation distribution for equity and
commodity indices. For bond indices the innovation distributions are only reasonable.
Corresponding Ljung-Box tests indicate independence of the standardized
residuals. Since the sample size is large and there is always some uncertainty in
the innovation distribution we selected the empirical probability integral
transformation to obtain marginally uniform data. The pair plots of the
resulting copula data (upper triangular matrix) and their estimated Kendall's tau
values (lower triangular matrix) for six representatives from the different indices
are given in Figure 3, indicating different strengths and signs of pairwise
dependencies.
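The empirical probability integral transformation mentioned above can be sketched as follows. The exact scaling used in the paper is not stated; the sketch assumes the common convention of dividing ranks by $N + 1$ so that all values stay strictly inside $(0, 1)$, and it assumes there are no ties. The function name is illustrative.

```python
def pseudo_observations(x):
    """Map a sample to approximately uniform data via scaled ranks: rank / (N + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

# Applied separately to each series of standardized residuals.
print(pseudo_observations([0.3, -1.2, 2.5]))  # [0.5, 0.25, 0.75]
```

Applying this transformation column by column to the standardized residuals gives the copula data on which the vine is fitted.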

Figure 3: Pairs-plots and Kendall's taus for representatives of each index group.

For model selection we want to demonstrate the superior fit of R-vines with
individually chosen pair-copula families and assess the gain over R-vines with
only bivariate t or with only Gauss pair-copulae as well as over standard C- and
D-vines. In particular we apply the selection algorithm of Section 3 to select
among five different R-vine classes given by

• mixed R-vine: R-vine with pair-copula terms chosen individually from 
seven bivariate copula types (Gauss, Student-t, Gumbel, survival Gumbel, 
rotated Gumbel (90 and 270 degrees), Frank). 

• mixed C-vine: C-vine with pair-copula terms chosen individually from 
seven bivariate copula types (see above). 

• mixed D-vine: D-vine with pair-copula terms chosen individually from 
seven bivariate copula types (see above). 

• all t R-vine: R-vine with each pair-copula term chosen as bivariate 
Student-t copula. If the degrees of freedom parameter of a pair is es- 
timated to be larger than 30, we set the copula to the Gaussian. 

• multivariate Gauss: R-vine with each pair-copula term chosen as bivariate
Gaussian copula, i.e., this corresponds to a multivariate Gaussian
copula, where unconditional correlations can be obtained from conditional
ones by inverting a generalized version of the partial correlation formula of Section 3.1.
Figure 4: $T_1$ for an R-vine from the model selection algorithm.



The top tree is common to all R-vines (in contrast to the C- and D-vines
which are determined as maximal stars and paths as noted in Section 3), since
the selection of the top tree does not depend on the pair-copula choice (but
only on the empirical Kendall's taus), and is given in Figure 4. The structure in
Figure 4 reflects expected relationships among the residuals of the indices. The
government bond indices are grouped so that consecutive maturities are con- 
nected. Similarly corporate bond indices are aligned according to their ratings 
from lowest (BBB) to highest (AAA). These two groups are connected by an av- 
erage representative, i.e., IBOXX-E-5-7 and IBOXX-E-AA. Since STOXX50 is 
a European equity index the residual dependency is highest to the predominant 
Euro bond index (IBOXX-G-3-5). 

For the copula family selection of each pair-copula term the AIC is used
as described in Section 3.2, where pair-copula parameters are estimated by
maximum likelihood estimation. In applying the selection algorithm we also
observed that empirical Kendall's tau values tend to be small for higher order
trees. In these cases it might be sufficient to replace the corresponding
pair-copula term by the independence copula. Therefore we also fitted an R-vine
using the preliminary independence test based on Kendall's tau for each pair
("indep. R-vine"). If the p-value of the test is larger than 5%, then we choose
the independence copula for this pair-copula term. The issue of large numbers of
independence copulae in later trees is further investigated in Brechmann et al.
(2012) who call an R-vine truncated if all pair-copulae in higher order trees are
set to bivariate independence copulae.

Applying the selection procedure to the R-vine mixed case, 16 Gauss,
51 Student-t, 4 Gumbel, 7 survival Gumbel, 12 rotated Gumbel and 30 Frank
bivariate copula terms requiring 171 parameter estimates were chosen. If the
choice for a pairwise independence copula is allowed, the total number of
parameters is reduced substantially to 108, since 55 copula terms were replaced by
an independence copula. These models correspond to the mixed/t scenario of
the simulation study in Appendix A and hence we can assume that our models
give rather adequate fits compared to the (unknown) "true" model.
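As a quick consistency check of these counts, and assuming (as stated in Section 3.2) that the Student-t copula is the only two parameter family while all other families have one parameter, the number of pair-copulae and of parameters in the mixed R-vine can be verified by

\[
16 + 51 + 4 + 7 + 12 + 30 = 120 = \binom{16}{2},
\qquad
16 + 2 \cdot 51 + 4 + 7 + 12 + 30 = 171 .
\]

By the same reasoning, an R-vine with Student-t copulae for all 120 terms would have $2 \cdot 120 = 240$ parameters.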

Selection results for all models are summarized in Table 2. It shows the log
likelihood achieved for sequential estimates in the first row, while the second
row gives the log likelihood after joint optimization of the chosen regular vine
tree specification and copula types (see Section 2.5). The next rows indicate the
number of pair-copula types chosen and the final rows give the test statistics
together with the p-values in parentheses of a Vuong test with and without
Akaike and Schwarz corrections, respectively, testing the R-vine mixed model
against the alternative indicated by the respective column. This shows that the
sequential log likelihood is quite close to the one obtained by joint maximization
for all model classes considered. Especially the top four ranks are maintained.
We also observe only small differences in the parameter estimates. The non-zero
number of (survival/rotated) Gumbel pair-copula terms shows non-symmetric
heavy tailed conditional dependencies present in the residual data. From the
Vuong tests we see that the mixed R-vine is to be preferred over the mixed
D-vine and the multivariate Gaussian copula. The difference to the all t R-vine
and to the mixed C-vine is also more pronounced when using the (parsimonious)
Schwarz correction; the mixed R-vine model is marginally superior in that case.
The choice of Gaussian copulae for Student-t copulae with too many degrees of
freedom means that the number of parameters in the all t R-vine is still close to
that of the mixed R-vine. If we chose Student-t copulae for all terms, the number
of parameters would be 240 and hence the influence of the corrections for the
number of parameters used would be stronger. Finally, the mixed R-vine model
reduced by independence pair-copula terms is preferred over the non-reduced
mixed R-vine model if a Schwarz correction is used, since the reduced model
has significantly fewer parameters to be estimated.
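For readers who want to reproduce such comparisons, the Vuong test statistic with optional Akaike and Schwarz corrections can be sketched as follows. The sketch assumes that pointwise log likelihood contributions of both fitted models are available; the function name, the argument names and the `correction` keyword are illustrative choices, not part of the paper's software.

```python
from math import log, sqrt
from statistics import NormalDist

def vuong_test(loglik1, loglik2, k1=0, k2=0, correction=None):
    """Vuong test for two non-nested models.

    loglik1, loglik2: pointwise log likelihoods of model 1 and model 2.
    k1, k2:           numbers of parameters (only used with a correction).
    correction:       None, "akaike" or "schwarz".
    Returns (statistic, two-sided p-value); positive values favor model 1.
    """
    n = len(loglik1)
    m = [a - b for a, b in zip(loglik1, loglik2)]
    mean = sum(m) / n
    sd = sqrt(sum((v - mean) ** 2 for v in m) / n)  # assumes sd > 0
    if correction == "akaike":
        penalty = k1 - k2
    elif correction == "schwarz":
        penalty = 0.5 * (k1 - k2) * log(n)
    else:
        penalty = 0.0
    stat = (sum(m) - penalty) / (sqrt(n) * sd)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value
```

A statistic above 1.96 favors the first model at the 5% level, a statistic below -1.96 favors the second, and values in between are inconclusive.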

Overall this example demonstrates the usefulness of R-vine copulae with
individually chosen copula types for each pair-copula term. In addition the
R-vine tree selection procedure gives directly economically interpretable results for
this data set.

Table 2: Log likelihoods, numbers of parameters and of copulae for all models as well as results
of the Vuong tests (test statistics and p-values in parentheses) comparing the R-vine model
with mixed copulae to all other models. The positive values of Vuong test statistics indicate
that the test favors the R-vine model over the respective alternative model (inconclusive region
at the 5%-level: [-1.96, 1.96]).

A note on the required computing time: In our implementation the sequential
selection and estimation Algorithm 3.1 took only between 5 minutes for the
reduced mixed R-vine model and 9 minutes for the mixed C-vine on a Linux
cluster computer with 32 processing cores (AMD Opteron, 2.6 GHz). In contrast
the maximum likelihood estimation was computationally much more demand- 
ing. While the computing time for the non-reduced mixed R-vine model was 
only 1.5 hours, it increased to about 9 hours for the all t R-vine and the mixed 
C- and D-vine models. 

5. Summary and discussion 

This paper provides a significant contribution towards making R-vine copu- 
lae a standard building block for copula based models. While already the intro- 
duction of C- and D-vine copulae provided flexibility in modeling dependencies, 
R-vine copulae provide even more modeling capabilities. Before the availabil- 
ity of such pair-copula constructions for multivariate copulae, the choices were 
rather limited. With R-vine copulae together with individual choices of
copula types for each pair-copula term, the problem of too few
modeling choices has shifted to the problem of too many choices to be
investigated.
In this paper we provided a general selection approach to sequentially choose
the tree representation together with choosing the copula type for each copula 
term from a large class of bivariate copula families and estimate the corre- 
sponding parameters. The selection approach involves sequentially the use of 
any graph theoretic algorithm which finds a maximum spanning tree. Absolute 
empirical Kendall's tau values are used as weights, but other weights are possi- 
ble. In finance the use of empirical tail dependence or other measures of joint 
tail behavior might be useful to investigate. 

The output of the selection procedure gives an R-vine tree structure, the
corresponding pair-copula types and parameter estimates. These so-called
sequential estimates can be used as starting values for determining the maximum
likelihood estimates (see also Hobaek Haff (2011) for more details on the
asymptotic behavior of these estimates). The paper also uses a matrix representation
of an R-vine and provides a novel algorithm to evaluate the joint density for
any arbitrary R-vine copula. The selection procedure is completely operational,
it is implemented in the statistical software R and is capable of handling medium
sized problems of up to 20 dimensions.

As noted in Section 4, it might be worthwhile to replace pair-copula terms
by independence copula terms or simpler copula type choices in higher order
trees. This issue has been investigated in the related work by Brechmann et al.
(2012), who developed testing procedures to determine truncation after a certain
tree. This further balances the model flexibility with the desired parsimony
of the model and opens R-vines to applications in large dimensions (see also
Brechmann and Czado (2011)).



In the future, we will also investigate the model selection problem described in
Section 3 more closely. This includes the choice of other weights than Kendall's
tau as well as the selection of C- and D-vines. In particular, the selection of 
the order in the first D-vine tree corresponds to a Traveling Salesman Problem 
and therefore is NP-equivalent. Here, tailor-made approaches for the D-vine 
methodology have to be considered. 
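
To make the connection to the Traveling Salesman Problem concrete, the following Python sketch selects the order of the first D-vine tree by exhaustive search: it looks for the path through all variables that maximizes the sum of pairwise weights (for example absolute Kendall's taus). This is only an illustration under stated assumptions; the function name `best_dvine_order` and the weight matrix are hypothetical, and the factorial running time makes the search feasible only for a handful of variables, which is exactly why tailor-made approaches are needed.

```python
from itertools import permutations


def best_dvine_order(weights):
    """Brute-force search for the maximum-weight Hamiltonian path.

    weights: symmetric d x d list of lists, e.g. absolute Kendall's taus.
    Returns (order, total_weight), where order is a tuple of variable indices.
    """
    d = len(weights)
    best_order, best_total = None, float("-inf")
    for perm in permutations(range(d)):
        if perm[0] > perm[-1]:
            continue  # a path and its reverse describe the same D-vine tree
        total = sum(weights[perm[k]][perm[k + 1]] for k in range(d - 1))
        if total > best_total:
            best_order, best_total = perm, total
    return best_order, best_total


# Hypothetical weight matrix for four variables.
W = [
    [0.0, 0.8, 0.1, 0.2],
    [0.8, 0.0, 0.7, 0.3],
    [0.1, 0.7, 0.0, 0.6],
    [0.2, 0.3, 0.6, 0.0],
]
order, total = best_dvine_order(W)
```

The loop visits d! orderings, so already for the 16-dimensional data set of Section 4 an exhaustive search is out of reach.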
Appendix A. Simulation study



In order to evaluate the approach of sequentially selecting and estimating
R-vines proposed in Section 3, we set up a comprehensive simulation study based
on the R-vine shown in Figure 1. In total we simulated samples of size 500, 1000
and 2000 according to twelve different scenarios, i.e., twelve different choices of 
pair-copula families and parameters. We repeated this 1000 times each. The 
considered scenarios are: 

• all Gaussian, all t, all Gumbel and all Frank R-vines: all pair-copula
families are chosen as Gaussian, Student-t, Gumbel and Frank
copulae, respectively. Degrees of freedom of the Student-t copula are 
linearly increased by 1 for pair-copula terms in higher order trees and 
start with 3 in the first tree. 

• mixed R-vine: different families for each pair-copula term. 

• t/mixed R-vine: Student-t copulae for pair-copulae in first two trees, 
mixed copulae for remaining pairs. Degrees of freedom of the Student-t 
copulae are also mixed. 

In each of these scenarios, parameters are chosen according to two different
settings of Kendall's taus (first, constant values per tree except for increased
values of the "central" copulae $c_{2,3}$, $c_{3,6}$ and $c_{2,6|3}$, and
second, mixed values; see (A.1) and (A.2), respectively) so that we end up with
twelve scenarios. While the R-vine structure matrix is given by (U),
corresponding matrices of Kendall's tau values as well as of copula types for
the mixed and t/mixed R-vines are shown in Appendix A.1 below.

Having simulated from the respective true model, we sequentially select and 
estimate by maximum likelihood estimation an R-vine model as described above 
and determine the following three quantities to evaluate the adequacy of our 
selection and estimation approach: 

• general tau-difference: we compute the mean absolute difference be- 
tween pairwise empirical Kendall's taus of simulated data from the true 
and from the selected models. The mean over all repetitions is reported. 

• lower and upper tau-difference: similarly we compute the mean absolute
difference between pairwise empirical lower and upper exceedance
Kendall's taus, which are defined for two variables $U_1$ and $U_2$ as
(Brechmann, 2010, Section 3.1.3)

$$\tau^{\text{lower}}(U_1, U_2) := \tau(U_1, U_2 \mid U_1 < \delta_1,\, U_2 < \delta_2),$$
$$\tau^{\text{upper}}(U_1, U_2) := \tau(U_1, U_2 \mid U_1 > 1 - \delta_1,\, U_2 > 1 - \delta_2),$$

and measure the strength of the joint tail behavior of $U_1$ and $U_2$. As
thresholds $\delta_1$ and $\delta_2$ we choose $\delta_1 = \delta_2 = 0.2$ as
recommended by Brechmann (2010). Again the means over all repetitions are
reported.
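
The exceedance Kendall's taus defined above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names `kendall_tau`, `exceedance_taus` and `mean_abs_difference` are hypothetical, the inputs are assumed to be copula data on [0, 1] without ties, and a tail region with fewer than two observations simply yields NaN.

```python
from itertools import combinations


def kendall_tau(pairs):
    """Empirical Kendall's tau of a list of (u1, u2) pairs (assumes no ties)."""
    n = len(pairs)
    if n < 2:
        return float("nan")  # tau is undefined for fewer than two observations
    s = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        prod = (a1 - a2) * (b1 - b2)
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def exceedance_taus(u1, u2, delta1=0.2, delta2=0.2):
    """Lower and upper exceedance Kendall's taus for thresholds delta1, delta2."""
    pairs = list(zip(u1, u2))
    lower = [(a, b) for a, b in pairs if a < delta1 and b < delta2]
    upper = [(a, b) for a, b in pairs if a > 1 - delta1 and b > 1 - delta2]
    return kendall_tau(lower), kendall_tau(upper)


def mean_abs_difference(taus_true, taus_selected):
    """Mean absolute difference between two equally ordered lists of taus."""
    diffs = [abs(a - b) for a, b in zip(taus_true, taus_selected)]
    return sum(diffs) / len(diffs)


# Hypothetical toy sample with three points in each tail region.
u1 = [0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95]
u2 = [0.02, 0.12, 0.08, 0.50, 0.95, 0.90, 0.85]
tau_lower, tau_upper = exceedance_taus(u1, u2)
```

Applying `exceedance_taus` to every pair of variables in data simulated from the true and from the selected model, and passing the two resulting lists to `mean_abs_difference`, gives one repetition's contribution to the lower or upper tau-difference.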
scenario      first setting (columns 3-5)    second setting (columns 6-8)
all t         0.038   0.009   0.037          0.040   0.010   0.039
all Gumbel    0.047   0.009   0.040          0.048   0.010   0.042
all Frank     0.062   0.008   0.062          0.058   0.008   0.058
mixed         0.049   0.013   0.050          0.048   0.013   0.048
t/mixed       0.039   0.010   0.039          0.041   0.010   0.041


Table A.3: Results of the simulation study. The second column indicates the respective
scenario for sample sizes of N = 500, N = 1000 and N = 2000. The results corresponding to
the first setting of Kendall's tau values are shown in columns 3-5, while those for the second
setting are displayed in columns 6-8.



The results of the simulations are shown in Table A.3 and can be summarized
as follows.

In terms of all three criteria, the performance improves with increasing sam- 
ple size due to a higher estimation accuracy and the smaller simulation error. 
Across both settings of parameters (chosen according to Kendall's tau values), 
the performance is very similar and only slightly worse in the case of mixed 
Kendall's taus. According to the general tau-difference criterion, the (non-tail 
dependent) all Gaussian and all Frank R-vines are identified best. The criteria 
based on exceedance Kendall's taus show that the all t and the t/mixed R-vines
as well as the upper tail of the all Gumbel R-vine are accurately modeled. That
is, our selection and estimation approach appropriately takes into account the
characteristic properties of the copula models.

Comparing the all t, the t/mixed and the mixed scenarios, it is evident 
that models with larger numbers of Student-t copulae (combined with mixed 
copulae) can be identified very well. This is in particular true when Kendall's 
tau values are mixed, which is typical for practical applications. 
Appendix A.1. Setting of the simulation study

In the following we show the matrices of Kendall's tau values for parameter 
choice in the above simulation study as well as the copula type matrices for 
the mixed and t/mixed scenarios. First, the two settings of Kendall's taus are
specified as follows.

• Constant Kendall's taus per tree:

0.10
τ_const =
0.15

• Mixed Kendall's taus:

0.10
τ_mixed =
0.15
0.20
0.20
0.40
0.50
0.60
0.65
0.70
0.75
(A.2)



Using abbreviations for copula types (N=Gaussian, t=Student-t, G=Gumbel,
SG=Survival Gumbel, F=Frank) the copula type matrices of the mixed and
t/mixed scenarios are given by:

• mixed R-vine: 



• t/mixed R-vine: